Motivated by a potential-outcomes perspective, the idea of principal
stratification has been widely recognized for its relevance in
settings susceptible to posttreatment selection bias such as random- 
ized clinical trials where treatment received can differ from treat- 
ment assigned. In one such setting, we address subtleties involved 
in inference for causal effects when using a key covariate to predict 
membership in latent principal strata. We show that when treatment 
received can differ from treatment assigned in both study arms,
incorporating a stratum-predictive covariate can make estimates of the
"complier average causal effect" (CACE) derive from observations in
the two treatment arms with different covariate distributions. Adopt- 
ing a Bayesian perspective and using Markov chain Monte Carlo for 
computation, we develop posterior checks that characterize the extent 
to which incorporating the pretreatment covariate endangers estima- 
tion of the CACE. We apply the method to analyze a clinical trial 
comparing two treatments for jaw fractures in which the study proto- 
col allowed surgeons to overrule both possible randomized treatment 
assignments based on their clinical judgment and the data contained 
a key covariate (injury severity) predictive of treatment received. 

1. Introduction. All-or-none treatment noncompliance in the context of 
a randomized two-arm clinical trial is perhaps the simplest and most com- 
mon example in health-sciences research of potential confounding by a post- 
treatment variable. One strategy to address confounding of treatment re- 
ceipt with individual characteristics is the use of an instrumental-variable 
method [McClellan, McNeil and Newhouse (1994)] which has been linked to 
a potential-outcomes perspective on causal inference [Angrist, Imbens and
Rubin (1996); Imbens and Rubin (1997); Frangakis and Rubin (1999)]. More
recently, strategies for addressing potential confounding by posttreatment 
variables have been formalized using the framework of principal stratification 
[Frangakis and Rubin (2002)], a central challenge of which is the classifica- 
tion of patients into latent subclasses, called principal strata, that facilitate 
causal treatment comparisons. 

In the context of treatment noncompliance, the target for inference is
often the "complier average causal effect" (CACE) [Imbens and Rubin (1997)],
which compares treatment outcomes with control outcomes in the principal
stratum of "compliers" who would receive whichever treatment is randomly
assigned (as distinct from other principal strata where patients may always
receive a particular treatment). Such comparisons within principal strata
are known as "principal effects" and permit causal interpretation.
Knowledge of membership in the stratum of compliers or in any other
principal stratum requires knowledge of patients' potential treatment
receipts under both possible randomized assignments, but this information
is never fully observed for any individual in the population, since
treatment received is observed only under the treatment actually assigned.

A battery of now-standard assumptions underlies methods for identifying
and estimating the CACE in settings framed as treatment noncompliance
[Angrist, Imbens and Rubin (1996)], but more recent attention [e.g., Hirano
et al. (2000); Jo and Stuart (2009)] has been paid to the use of pretreatment
covariates to increase precision or relax exclusion restrictions. One line of
research focuses on settings where patients randomized to the control arm do
not have access to the active treatment, that is, settings where the entire
population would receive control if so assigned. The key feature of these
settings is that they allow patients who are assigned and receive active
treatment to be identified as compliers, which further allows pretreatment
covariates associated with membership in this stratum to be used in
identifying which patients randomized to control are exchangeable with
compliers. Specifically, these settings motivate so-called "two-stage"
approaches that first use pretreatment covariates to estimate propensity
scores [Rosenbaum and Rubin (1983)] of membership in the complier stratum,
then estimate outcomes conditional on these so-called "principal scores"
[Follmann (2000); Hill, Brooks-Gunn and Waldfogel (2003); Joffe, Ten Have
and Brensinger (2003); Joffe, Small and Hsu (2007); Jo and Stuart (2009)].
Although some previous research has framed the one-sided access to treatment
as a nonessential detail that merely simplifies exposition, we aim to show
that added complexity can arise in more general settings where noncompliance
exists in both treatment arms.

When treatment received can deviate from treatment assigned in both
study arms, the use of pretreatment covariates to aid estimation of the
CACE is more complicated because no patient is known to belong to the
stratum of compliers, precluding estimation of a model such as a propensity
score model for membership in the complier stratum. Joint-estimation
methods that simultaneously model stratum membership and outcomes have been
employed in these settings [Hirano et al. (2000); Frangakis, Rubin and Zhou
(2002); Barnard et al. (2003); Griffin, McCaffrey and Morral (2008); Roy,
Hogan and Marcus (2008); Gallop et al. (2009)]. Such settings typically
involve two underlying strata in addition to the compliers: "never-takers"
who would never receive the active treatment, and "always-takers" who would
always receive the active treatment. Through use of standard assumptions
that will be elaborated later, stratum membership for patients who receive
a treatment different from that assigned can be regarded as having been
revealed, with such individuals being either never-takers or always-takers,
and covariates associated with membership in these two "noncomplier" strata
can be identified. However, membership in the complier stratum is never
directly observed because patients who receive the assigned treatment (and
thus might be compliers) generally represent mixtures of compliers and
never-takers (in the control arm) or compliers and always-takers (in the
treatment arm). Since pretreatment covariates can only provide direct
information about characteristics of noncompliers, the role of such
covariates in estimating the CACE is to model which patients in the
complier/noncomplier mixtures are noncompliers, thus indirectly estimating
the remaining portion of the mixture to belong to the stratum of compliers.

In this article we employ a joint-estimation method using a Gibbs
sampling computational approach [Geman and Geman (1984); Gelfand and
Smith (1990)] in a setting where noncompliance exists in both randomization
arms. We aim to improve the estimate of the CACE through incorporation of
a compliance-predictive model that uses a key covariate to select compliers
from the complier/noncomplier mixtures. Our novel contribution is a
detailed exposition of scenarios in which observed data predict membership
in the noncomplier strata in a way that can select compliers in each
treatment group from different portions of the covariate distribution,
potentially implying that the estimated CACE is biased for the causal effect
of treatment. After introducing the motivating oral-surgery application in
Section 2, Section 3 formally defines a potential-outcomes inference
framework and the assumptions necessary for estimation of the CACE.
Section 4 develops the compliance-predictive model and corresponding
estimation procedure. Section 5 uses simulated examples to illustrate some
posterior checks and illuminate the potential for bias resulting from the
compliance-predictive model, and Section 6 illustrates the impact of using
the key covariate to predict compliance status in the oral-surgery setting.
We conclude with a discussion.

2. Motivating oral-surgery clinical trial. Our motivating example
consists of 142 patients who were randomly assigned to receive treatment for
jaw fractures in the form of Maxillomandibular Fixation (MMF, control) or
Rigid Internal Fixation (RIF, active treatment). The study aimed to
investigate the putative advantages of the increasingly popular RIF over the
more traditional MMF in a patient population thought to be prone to
postoperative complications. A degree of clinical flexibility was deemed
essential to the protocol, allowing treatment decisions to depart from the
randomized treatment assignment if deemed necessary by the treating surgeon.
This clinical latitude gives rise to possible concerns that more severely
injured patients were disproportionately selected into the more aggressive
treatment arm, as it is well accepted in the surgical community that the MMF
procedure, which is less expensive, is appropriate for less severe injuries
while the RIF procedure, which is more resource-intensive, is appropriate
for more severe injuries. Although the exact rationale for treatment
decisions was not recorded, a continuously scaled measure of injury severity
(SEV) was calculated for each patient. This severity measure, originally
developed as the Mandible Injury Severity Score (MISS) [Shetty et al.
(2007)], ranges from 0 (less severe) to 25 (extremely severe), and derives
from anatomic and clinical characteristics of the constituent jaw fractures.
The outcome of interest was a continuously scaled General Oral Health
Assessment Index (GOHAI) [Atchison (1997)] measured at six months
post-treatment, with higher values suggesting better oral-health quality of
life. In the face of "noncompliance" (i.e., surgical judgment overriding the
treatment assigned through the randomization protocol), one could conduct
intention-to-treat and as-treated analyses [Shetty et al. (2008)], but the
former addresses a question that is arguably not the only scientific
question of interest, the latter can give rise to bias in estimates of the
treatment effect, and neither accounts for the plausible effect that
subjective treatment decisions had on the analysis.

3. Potential outcomes, principal strata and causal estimand. Definition 
of principal strata and causal estimands requires development of a potential- 
outcomes framework, often called the Rubin Causal Model [Rubin (1978a); 
Holland (1986)]. Following previous development in the setting of all-or- 
none treatment noncompliance in a two-arm clinical trial [Angrist, Imbens 
and Rubin (1996)], we define potential outcomes and delineate the
principal strata that arise in our motivating setting. We then outline the
assumptions necessary for identifiability of the causal estimand of interest,
the CACE.

3.1. Potential outcomes and principal strata. First, we define the
relevant potential outcomes inherent in this clinical trial. Define
$\mathbf{Z}$ as the vector of random treatment assignments for all patients
in the study, with $i$th element $Z_i$ equal to 0 for assignment to MMF and
1 for assignment to RIF. Let $\mathbf{D}(\mathbf{Z})$ be a vector with $i$th
element $D_i(\mathbf{Z})$ denoting the $i$th patient's received treatment
under assignment $\mathbf{Z}$. Patients with $D_i(\mathbf{Z}) = 0$ would
receive treatment with MMF under assignment $\mathbf{Z}$, while patients
with $D_i(\mathbf{Z}) = 1$ would receive treatment with RIF under assignment
$\mathbf{Z}$. Furthermore, we use $Y_i(\mathbf{Z}, \mathbf{D})$ to denote a
patient's potential GOHAI with respect to $\mathbf{Z}$ and $\mathbf{D}$. We
adopt the stable unit treatment value assumption (SUTVA) [Rubin (1978a)]
here to indicate no interference between patients, allowing us to write
$D_i(\mathbf{Z}) = D_i(Z_i)$ and $Y_i(\mathbf{Z}, \mathbf{D}) = Y_i(Z_i)$.

Principal strata in this setting are defined by all four possible values
of the pair $(D_i(0), D_i(1))$. We call the principal stratum of patients
with $(D_i(0) = 0, D_i(1) = 1)$ "compliers" who will receive the assigned
treatment regardless of which treatment is assigned, and denote these
patients as having $S_i = c$. Similarly, we call the stratum with
$(D_i(0) = 0, D_i(1) = 0)$ "never-takers" who will never receive treatment
with RIF, denoting these patients with $S_i = n$, and the stratum of
patients with $(D_i(0) = 1, D_i(1) = 1)$ "always-takers" who will always
receive treatment with RIF, which we label with $S_i = a$. Finally, we
define the principal stratum of "defiers" as those with
$(D_i(0) = 1, D_i(1) = 0)$, or those who will always receive the treatment
opposite of that assigned, with $S_i = d$.
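
As a minimal illustration of these definitions, the following Python sketch maps a pair of potential treatment receipts $(D_i(0), D_i(1))$ to the stratum labels used above. The function name `principal_stratum` is hypothetical and introduced only for this illustration; it is not part of the original analysis.

```python
def principal_stratum(d0: int, d1: int) -> str:
    """Return the stratum label S_i for potential receipts (D_i(0), D_i(1)).

    The value 0 codes receipt of MMF and 1 codes receipt of RIF.
    """
    labels = {
        (0, 1): "c",  # complier: receives whichever treatment is assigned
        (0, 0): "n",  # never-taker: never receives RIF
        (1, 1): "a",  # always-taker: always receives RIF
        (1, 0): "d",  # defier: receives the opposite of the assignment
    }
    return labels[(d0, d1)]
```

In practice the pair is never fully observed for any patient, so a function of this kind applies only to simulated or hypothetical complete data.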

Naturally, we observe only one component of $(D_i(0), D_i(1))$ and only one
component of $(Y_i(0), Y_i(1))$. To draw out this distinction between
observed and missing components, we write $(D_i^{obs}, D_i^{mis})$ and
$(Y_i^{obs}, Y_i^{mis})$, where the superscripts obs and mis denote the
observed and missing potential outcomes, respectively.

3.2. Assumptions for identifiability of causal estimands. As no complete
pair of potential outcomes is observable, we require additional assumptions
for identifiability of causal estimands. In addition to SUTVA, we adopt
a monotonicity assumption [Imbens and Angrist (1994)] disallowing the
existence of the principal stratum of defiers, that is, there are no patients
who would receive MMF if assigned RIF but receive RIF if assigned MMF.
This setting with noncompliance resulting from clinicians' judgment is
unlikely to produce a violation of the monotonicity assumption. The
usefulness of monotonicity lies in its implication that patients with
$D_i^{obs} = D_i(0) = 1$ must belong to the stratum of always-takers and
those with $D_i^{obs} = D_i(1) = 0$ must belong to the stratum of
never-takers. Stratum membership for those who received the assigned
treatment remains unidentified, as patients with $D_i^{obs} = D_i(0) = 0$
represent a mixture of compliers and never-takers, while those with
$D_i^{obs} = D_i(1) = 1$ represent a mixture of compliers and always-takers.
The first three columns of Table 1 provide a summary of the possible
principal strata for patients with each possible observed pattern of $Z_i$
and $D_i^{obs}$.
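
The consequence of monotonicity described above can be sketched in a few lines of Python. The function `possible_strata` below is a hypothetical helper, not taken from the original analysis; it enumerates the strata compatible with an observed assignment and receipt once defiers are ruled out.

```python
def possible_strata(z: int, d_obs: int) -> set:
    """Strata compatible with assignment z and observed receipt d_obs,
    assuming monotonicity (no defiers)."""
    # Potential receipts (D_i(0), D_i(1)) for each stratum allowed
    # under monotonicity.
    receipts = {"c": (0, 1), "n": (0, 0), "a": (1, 1)}
    compatible = set()
    for label, pair in receipts.items():
        # pair[z] is the receipt this stratum would show under assignment z.
        if pair[z] == d_obs:
            compatible.add(label)
    return compatible
```

The two patterns in which assignment is overruled return a single stratum, while the two patterns in which the assigned treatment is received return a two-component mixture.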
Table 1
Possible principal strata for observed treatment assignment and receipt patterns and
summary statistics for SEV and GOHAI in the motivating oral-surgery setting

Treatment      Treatment             Possible principal strata                          Mean (SD)     Mean (SD)
assigned, Z_i  received, D_i^{obs}   (D_i(0), D_i(1))                              n    SEV           GOHAI

0              0                     compliers or never-takers                     53   12.8 (2.7)    42.8 (12.1)
                                     (D_i(0) = 0, D_i(1) = 0 or 1)
0              1                     always-takers                                  9   14.0 (2.0)    42.8 (11.9)
                                     (D_i(0) = 1, D_i(1) = 1)
1              1                     compliers or always-takers                    40   13.2 (2.3)    44.5 (12.1)
                                     (D_i(0) = 1 or 0, D_i(1) = 1)
1              0                     never-takers                                  40   12.2 (3.0)    41.7 (9.3)
                                     (D_i(0) = 0, D_i(1) = 0)

Inference for causal effects in clinical trials with treatment noncompliance
typically relies on another assumption, known as the exclusion restriction
[Angrist, Imbens and Rubin (1996)], stating that any effect of treatment
assignment, $Z$, on the outcome, $Y$, must be via an effect of treatment
received, $D$. After accounting for received treatment, random assignment no
longer affects GOHAI, or
$(Y(z) \mid Z = z, D(z) = d) = (Y(z) \mid D(z) = d)$ for $z = 0, 1$ and
$d = 0, 1$.

With the above development, we define the CACE as the expected difference
in potential GOHAI outcomes within the stratum of compliers:

\[ \mathrm{CACE} = E[Y_i(1) - Y_i(0) \mid S_i = c]. \]

4. Bayesian models for the CACE with the compliance-predictive feature.
We formulate our inference strategy with a phenomenological Bayesian model
following Imbens and Rubin (1997). The model is phenomenological in the
sense described by Rubin (1978a, 1978b), where the inference builds on
potentially observable quantities even though not all of the quantities will
be observed. The relevant random variables for each patient are $Z_i$,
$D_i(0)$, $D_i(1)$, $Y_i(0)$, $Y_i(1)$ and $X_i$, where $X_i$ denotes the
$i$th patient's SEV. We consider these random variables realizations from
a joint distribution, with $X_i$, $Z_i$, $D_i^{obs}$ and $Y_i^{obs}$
observed for each patient. Our goal is to model the distributions of
$Y_i(z)$ conditional on principal stratum, which requires integration over
missing values as a result of the unidentifiable mixtures over the latent
$S_i$. This motivates a Gibbs-sampling strategy that first samples the
missing $S_i$, thereby allowing assessment of the distributions of $Y_i(z)$
conditional on the "complete compliance data" consisting of subpopulations
without mixture components.






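4.1. Structure of Bayesian inference. The joint distribution of the data
can be factored as follows:

\[
\begin{aligned}
f(\mathbf{Z}, (\mathbf{Y}(0), \mathbf{Y}(1)), (\mathbf{D}(0), \mathbf{D}(1)), \mathbf{X})
  &= f(\mathbf{Z}, \mathbf{Y}, \mathbf{S}, \mathbf{X}) \\
  &= f(\mathbf{Y}, \mathbf{S}, \mathbf{X} \mid \mathbf{Z})\, f(\mathbf{Z}) \\
  &= f(\mathbf{Y}, \mathbf{S}, \mathbf{X})\, f(\mathbf{Z}),
\end{aligned}
\tag{4.1}
\]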

where the last equality holds due to randomization in the study design. We
facilitate Bayesian inference by writing the joint distribution of
$\mathbf{Y}$, $\mathbf{S}$ and $\mathbf{X}$ as the product of independent
and identically distributed random variables conditional on a generic
parameter $\theta$ [de Finetti (1974)], where we denote the prior
distribution of $\theta$ as $p(\theta)$ and the posterior distribution of
$\theta$ as

\[
p(\theta \mid \mathbf{Y}^{obs}, \mathbf{D}^{obs}, \mathbf{X}, \mathbf{Z})
  \propto p(\theta) \int\!\!\int \prod_i
  f(Y_i^{obs}, Y_i^{mis}, D_i^{obs}, D_i^{mis}, X_i \mid \theta)
  \, dY_i^{mis} \, dD_i^{mis}.
\tag{4.2}
\]

As pointed out in Frangakis, Rubin and Zhou (2002) and Jin and Rubin
(2008), required integration over $\mathbf{D}^{mis}$ proves computationally
difficult in general, but as a result of randomization $\mathbf{Y}^{mis}$
can be handled with standard randomization-based tools. Furthermore, the
difficult integration over $D_i^{mis}$ leads us to consider the joint
posterior of $(\theta, \mathbf{D}^{mis})$,

\[
p(\theta, \mathbf{D}^{mis} \mid \mathbf{D}^{obs}, \mathbf{Y}^{obs}, \mathbf{X}, \mathbf{Z})
  \propto p(\theta) \prod_i f(D_i^{obs}, D_i^{mis}, Y_i^{obs}, X_i \mid \theta),
\tag{4.3}
\]

which is proportional to a standard posterior distribution of $\theta$ had
$D_i^{mis}$ been observed [Jin and Rubin (2008)], further motivating the
strategy of first drawing $\mathbf{D}^{mis}$ and then sampling from the
posterior distribution of $\theta$ conditional on complete compliance data.
Posterior distributions of the relevant quantities follow from specification
of both $p(\theta)$ and the models defined in Section 4.2. We describe our
prior distributions for $\theta$ in Section 4.4.

4.2. Models for principal strata and outcomes. To estimate the CACE,
we further factor the joint distribution in (4.1) as
$f(\mathbf{Y} \mid \mathbf{S}, \mathbf{X}) f(\mathbf{S} \mid \mathbf{X}) f(\mathbf{X}) f(\mathbf{Z})$,
and specify models for $f(\mathbf{Y} \mid \mathbf{S}, \mathbf{X})$ and
$f(\mathbf{S} \mid \mathbf{X})$. As the population consists of three
underlying strata, we follow the approach used in Frangakis, Rubin and Zhou
(2002) and Barnard et al. (2003), whereby we model
$f(\mathbf{S} \mid \mathbf{X})$ with two linked probit models, the first
modeling membership in the never-taker stratum and the second modeling
membership in the complier stratum conditional on exclusion from the
never-taker stratum. We parameterize these models as

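\[
\begin{aligned}
\Psi_n(X_i, \beta) &= P(S_i = n \mid X_i, \beta) = 1 - \Phi(\beta_{00} + \beta_{01} X_i), \\
\Psi_c(X_i, \beta) &= P(S_i = c \mid X_i, \beta)
  = \{1 - \Psi_n(X_i, \beta)\}\{1 - \Phi(\beta_{10} + \beta_{11} X_i)\} \quad \text{and} \\
\Psi_a(X_i, \beta) &= P(S_i = a \mid X_i, \beta) = 1 - \Psi_n(X_i, \beta) - \Psi_c(X_i, \beta),
\end{aligned}
\tag{4.4}
\]

where $\beta = (\beta_{00}, \beta_{01}, \beta_{10}, \beta_{11})$ and $\Phi$
is the standard normal cumulative distribution function. To facilitate
computation, we represent these models as arising from underlying continuous
random variables $S_i^{n}$ and $S_i^{c}$,

\[
\begin{aligned}
S_i &= n \quad \text{if } S_i^{n} = \beta_{00} + \beta_{01} X_i + V_i \le 0, \\
S_i &= c \quad \text{if } S_i^{n} > 0 \text{ and } S_i^{c} = \beta_{10} + \beta_{11} X_i + U_i \le 0 \quad \text{and} \\
S_i &= a \quad \text{if } S_i^{n} > 0 \text{ and } S_i^{c} > 0,
\end{aligned}
\tag{4.5}
\]

where the $V_i$ and $U_i$ are independently distributed as $N(0, 1)$.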
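
The linked probit probabilities in (4.4) can be evaluated directly. The following Python sketch is one possible implementation under the parameterization above; the function names `std_normal_cdf` and `stratum_probabilities` are hypothetical, and the standard normal distribution function is computed from the error function rather than from any particular statistical library.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stratum_probabilities(x: float, beta) -> tuple:
    """Return (Psi_n, Psi_c, Psi_a) from (4.4) for covariate value x.

    beta = (b00, b01, b10, b11): coefficients of the two linked probits.
    """
    b00, b01, b10, b11 = beta
    # Never-taker: latent S^n = b00 + b01*x + V is at or below zero.
    psi_n = 1.0 - std_normal_cdf(b00 + b01 * x)
    # Complier: not a never-taker, and latent S^c is at or below zero.
    psi_c = (1.0 - psi_n) * (1.0 - std_normal_cdf(b10 + b11 * x))
    # Always-taker: the remaining probability.
    psi_a = 1.0 - psi_n - psi_c
    return psi_n, psi_c, psi_a
```

With all coefficients equal to zero, for instance, the three probabilities are 0.5, 0.25 and 0.25, which reflects the sequential structure of the two probits rather than a symmetric three-way split.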



We illustrate the analysis with two different models for
$f(\mathbf{Y} \mid \mathbf{S}, \mathbf{X})$. The first model (Model A)
entails a regression adjustment for the key covariate's association with the
outcome:

\[
\begin{aligned}
f(Y_i(z) \mid X_i, S_i = n) &= g_n(Y_i \mid \alpha_0^{n}, \alpha_1^{n}, X_i, \sigma^2)
  \sim N(\alpha_0^{n} + \alpha_1^{n} X_i, \sigma^2), \\
f(Y_i(z) \mid X_i, S_i = a) &= g_a(Y_i \mid \alpha_0^{a}, \alpha_1^{a}, X_i, \sigma^2)
  \sim N(\alpha_0^{a} + \alpha_1^{a} X_i, \sigma^2) \quad \text{and} \\
f(Y_i(z) \mid X_i, S_i = c, Z_i = z) &= g_{cz}(Y_i \mid \alpha_0^{cz}, \alpha_1^{cz}, X_i, \sigma^2)
  \sim N(\alpha_0^{cz} + \alpha_1^{cz} X_i, \sigma^2) \quad \text{for } z = 0, 1,
\end{aligned}
\tag{4.6}
\]

implying the exclusion restriction and the assumption that GOHAI outcomes
are distributed with the same variance in each stratum and for each
treatment receipt. For comparison purposes, we also conduct the analysis
under another model (Model B) that does not explicitly incorporate $X$ in
the model for $Y(z)$, entailing the additional assumption that
$Y(z) \perp\!\!\!\perp X \mid S$. That is, Model B incorporates the
restriction that
$\alpha_1^{n} = \alpha_1^{a} = \alpha_1^{c0} = \alpha_1^{c1} = 0$,
representing a "standard" unadjusted CACE analysis.

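The observed-data likelihood reflecting the mixtures over the latent $S_i$
can be written as

\[
\begin{aligned}
L^{obs}(\theta \mid \mathbf{Z}, \mathbf{D}^{obs}, \mathbf{Y}^{obs}, \mathbf{X})
 = {} & \prod_{Z_i = 1,\, D_i^{obs} = 0}
   \{\Psi_n(X_i, \beta) \cdot g_n(Y_i \mid \alpha_0^{n}, \alpha_1^{n}, X_i, \sigma^2)\} \\
 & \times \prod_{Z_i = 0,\, D_i^{obs} = 1}
   \{\Psi_a(X_i, \beta) \cdot g_a(Y_i \mid \alpha_0^{a}, \alpha_1^{a}, X_i, \sigma^2)\} \\
 & \times \prod_{Z_i = 0,\, D_i^{obs} = 0}
   \{\Psi_n(X_i, \beta) \cdot g_n(Y_i \mid \alpha_0^{n}, \alpha_1^{n}, X_i, \sigma^2) \\
 & \qquad\qquad + \Psi_c(X_i, \beta) \cdot g_{c0}(Y_i \mid \alpha_0^{c0}, \alpha_1^{c0}, X_i, \sigma^2)\} \\
 & \times \prod_{Z_i = 1,\, D_i^{obs} = 1}
   \{\Psi_a(X_i, \beta) \cdot g_a(Y_i \mid \alpha_0^{a}, \alpha_1^{a}, X_i, \sigma^2) \\
 & \qquad\qquad + \Psi_c(X_i, \beta) \cdot g_{c1}(Y_i \mid \alpha_0^{c1}, \alpha_1^{c1}, X_i, \sigma^2)\},
\end{aligned}
\tag{4.7}
\]

where $\theta = (\beta_{00}, \beta_{01}, \beta_{10}, \beta_{11}, \alpha_0^{n},
\alpha_1^{n}, \alpha_0^{a}, \alpha_1^{a}, \alpha_0^{c0}, \alpha_1^{c0},
\alpha_0^{c1}, \alpha_1^{c1}, \sigma^2)$ and the product over
$Z_i = z, D_i^{obs} = d$ represents the product over all patients assigned
treatment $z$ who were observed to receive treatment $d$.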
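
To make the structure of (4.7) concrete, the following Python sketch evaluates the observed-data log-likelihood for fully observed outcomes. It is an illustration under stated assumptions rather than the authors' implementation: the function and argument names are hypothetical, outcomes are assumed to be nonmissing, and the `alpha` dictionary keyed by `"n"`, `"a"`, `"c0"` and `"c1"` is simply a convenient container for the intercept and slope pairs in (4.6).

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def obs_log_likelihood(data, beta, alpha, sigma2):
    """Log of (4.7). data holds tuples (z, d_obs, y_obs, x);
    alpha maps 'n', 'a', 'c0', 'c1' to (intercept, slope)."""
    b00, b01, b10, b11 = beta
    total = 0.0
    for z, d, y, x in data:
        psi_n = 1.0 - std_normal_cdf(b00 + b01 * x)
        psi_c = (1.0 - psi_n) * (1.0 - std_normal_cdf(b10 + b11 * x))
        psi_a = 1.0 - psi_n - psi_c

        def g(key):
            a0, a1 = alpha[key]
            return normal_pdf(y, a0 + a1 * x, sigma2)

        if z == 1 and d == 0:      # revealed never-taker
            contrib = psi_n * g("n")
        elif z == 0 and d == 1:    # revealed always-taker
            contrib = psi_a * g("a")
        elif z == 0 and d == 0:    # never-taker / complier mixture
            contrib = psi_n * g("n") + psi_c * g("c0")
        else:                      # always-taker / complier mixture
            contrib = psi_a * g("a") + psi_c * g("c1")
        total += math.log(contrib)
    return total
```

The two mixture branches are what make direct maximization or integration awkward, which is the motivation given in the text for augmenting the data with the latent $S_i$.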

As a result of random assignment to treatment, the $Y_i^{mis}$ in the
stratum of compliers is sampled from the distribution $g_{c(1 - Z_i)}$, and
the CACE estimate is calculated as

\[
\mathrm{CACE} = E[Y_i(1) - Y_i(0) \mid X_i, S_i = c]
  = \frac{1}{n_c} \sum_{S_i = c} (Y_i(1) - Y_i(0)),
\]

where $n_c$ is the number of patients with $S_i = c$ at the current iteration.
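
The per-iteration calculation just described amounts to averaging completed outcome differences over the patients currently labelled as compliers. A minimal Python sketch follows; the function name `cace_at_iteration` and its list-based inputs are assumptions made for illustration.

```python
def cace_at_iteration(y1, y0, strata):
    """Average Y_i(1) - Y_i(0) over patients currently labelled compliers.

    y1, y0: completed potential outcomes per patient (each value is either
            observed or imputed at the current iteration).
    strata: current stratum labels, each one of 'c', 'n' or 'a'.
    """
    diffs = [a - b for a, b, s in zip(y1, y0, strata) if s == "c"]
    if not diffs:
        raise ValueError("no patients are labelled compliers at this iteration")
    return sum(diffs) / len(diffs)
```

Because the set of patients with $S_i = c$ changes from one iteration to the next, both the numerator and $n_c$ vary across the sampler, and patients with a higher probability of being sampled as compliers contribute more often.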

4.3. Sampling compliers within the compliance-predictive model. Despite
the existence of three underlying strata, the step of the Gibbs sampler that
determines patients' unknown compliance status does so via Bernoulli
distributions reflecting the fact that patients who received the assigned
treatment can belong to one of only two possible strata. Owing to these
underlying two-component mixtures, the probability at a given iteration of
the sampler that a patient with $Z_i = D_i^{obs} = z$ belongs to the stratum
of compliers is

\[
P(S_i = c \mid X_i, Y_i^{obs}, D_i^{obs}, Z_i, \theta)
 = \frac{\Psi_c(X_i, \beta) \cdot g_{cz}(Y_i \mid \alpha_0^{cz}, \alpha_1^{cz}, X_i, \sigma^2)}
        {\Psi_c(X_i, \beta) \cdot g_{cz}(Y_i \mid \alpha_0^{cz}, \alpha_1^{cz}, X_i, \sigma^2)
         + \Psi_t(X_i, \beta) \cdot g_t(Y_i \mid \alpha_0^{t}, \alpha_1^{t}, X_i, \sigma^2)},
\tag{4.8}
\]

where $t = n$ if $z = 0$ and $t = a$ if $z = 1$. Examining these
probabilities makes clear that the relative impacts of $X_i$ and
$Y_i^{obs}$ on (4.8) depend on the extent to which $X$ predicts stratum and
on the amount of overlap between the distributions $g_{cz}$ and $g_t$.
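
The Bernoulli probability in (4.8) is a two-component mixture weight and can be sketched as follows. The function names and the choice to pass the stratum probabilities and conditional means as plain numbers are assumptions for illustration only.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def complier_probability(psi_c, psi_t, y, mean_c, mean_t, sigma2):
    """Probability in (4.8) that a patient with Z_i = D_i^obs = z is a complier.

    psi_c, psi_t: stratum probabilities Psi_c and Psi_t at the patient's X_i,
                  with t = n when z = 0 and t = a when z = 1.
    mean_c, mean_t: outcome means under g_cz and g_t at the patient's X_i.
    """
    num = psi_c * normal_pdf(y, mean_c, sigma2)
    den = num + psi_t * normal_pdf(y, mean_t, sigma2)
    return num / den
```

When the two outcome distributions coincide, the outcome carries no information and the probability reduces to $\Psi_c / (\Psi_c + \Psi_t)$, so that the covariate alone drives which patients are sampled as compliers.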

4.4. Additional model specifications and statistical computing details. We
treat the elements of $\theta$ as a priori independent, using conditionally
conjugate normal distributions for the elements of $\beta$ and for
$\alpha_0^{n}$, $\alpha_0^{a}$, $\alpha_0^{cz}$, $\alpha_1^{n}$,
$\alpha_1^{a}$ and $\alpha_1^{cz}$, and a conditionally conjugate gamma
distribution for the precision parameter $1/\sigma^2$. The distributions for
$(\alpha_0^{n}, \alpha_0^{a}, \alpha_0^{cz})$ are centered at the overall
sample mean GOHAI with variances of 100, and the distributions for
$(\alpha_1^{n}, \alpha_1^{a}, \alpha_1^{cz})$ are centered at 0 with
variances of 100. The prior distribution for the precision parameter is
gamma with shape and scale parameters set to 0.01. Prior distributions for
the elements of $\beta$ are centered at 0 with variance 5.

After a burn-in of 5,000 iterations, each chain is run for 5,000 additional 
iterations, saving every 10th sample. For each model, three chains are run 
from different starting values, and the potential scale-reduction statistics 
[Gelman and Rubin (1992); Gelman et al. (2004)] are calculated for each
parameter to assess convergence. All parameters in all models had potential
scale-reduction statistics less than or equal to 1.06, suggesting satisfactory 
convergence. For each model, the three chains are combined to calculate 
posterior estimates. 
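
For readers unfamiliar with the convergence diagnostic, the following Python sketch computes a basic form of the potential scale-reduction statistic for one parameter from several equal-length chains. It is a generic illustration and may differ in detail from the variant used in the analysis; the function name is hypothetical. The last two lines record the number of retained draws implied by the settings above, assuming thinning is applied to the 5,000 post-burn-in iterations only.

```python
import math
import statistics


def potential_scale_reduction(chains):
    """Basic potential scale-reduction statistic for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the marginal posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


saved_per_chain = 5000 // 10          # every 10th of 5,000 iterations
combined_draws = 3 * saved_per_chain  # three chains combined
```

Values close to 1 indicate that the between-chain variability is small relative to the within-chain variability, which is the sense in which statistics at or below 1.06 suggest satisfactory convergence.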

5. Illustration of the potential for bias in the CACE using simulated
data. To illustrate that the compliance-predictive model can imply
complier treatment groups with different characteristics and to illustrate
our graphical diagnostic, we examine in detail a simulated scenario where
X is predictive of stratum membership and the true CACE = 0. Details of this
simulation and a broader simulation study appear in a supplementary web
appendix [Zigler and Belin (2011)].

To investigate the relationships between X and stratum membership under
Model A, we examine posterior-predictive distributions of the probabilities
in (4.8) for a hypothetical group of patients having an X distribution
mirroring that in the observed data. Figure 1(a) displays, for z = 0, 1,
histograms of the observed X distributions in patients with
$Z_i = D_i^{obs} = z$, with histogram bars shaded according to the mean
posterior-predictive probability of membership in the complier stratum for
a value of X at that point of the histogram and for $Y_i^{obs}$ equal to the
mean value observed in patients with $Z_i = D_i^{obs} = z$. Note the
different shading patterns in the two histograms. For $Z_i = D_i^{obs} = 0$,
histogram bars are darker as X increases (more severely injured patients are
more likely compliers), while for $Z_i = D_i^{obs} = 1$, histogram bars are
lighter as X increases (more severely injured patients are less likely
compliers). For example, note that patients with X in the range [9, 12] in
the $Z_i = D_i^{obs} = 1$ group have probability of membership in the
complier stratum near 1.0, while patients with the same range of SEV in the
$Z_i = D_i^{obs} = 0$ group have probability of membership in the complier
stratum in the range [0.1, 0.4]. The implication for estimates of the CACE
is that over the course of the sampler, patients in the observed mixture of
compliers and always-takers ($Z_i = D_i^{obs} = 1$) with lower X will more
often contribute to the CACE than patients with comparable X in the observed
mixture of compliers and never-takers ($Z_i = D_i^{obs} = 0$). The opposite
sampling disparity holds for patients with higher X. If X is also related to
the primary outcome, Y, a situation such as that depicted in Figure 1(a)
leaves estimates of the CACE particularly vulnerable to misspecified models
that incorrectly extrapolate to areas of the X distribution where there are
limited data. For example, estimation of a typical unadjusted CACE (Model B)
would represent one such misspecified model, and could lead to vastly
different estimates of the CACE. To illustrate this point, Figure 1(b)
displays posterior estimates of the CACE using both Model A and Model B in
scenarios where X is related to stratum membership and with varying
magnitudes of the relationship between X and Y. We see that under Model B,
the imbalanced sampling of compliers evident from Figure 1(a) leads to bias
in the estimated CACE that is increasing in |Corr(X, Y)|, providing
misleading results even when the association between X and Y is modest and
in some cases estimating a significant treatment effect when there in fact
is none. The same bias does not appear under Model A because even though
there are limited data on comparable compliers in some areas of the X
distribution, extrapolation of Model A to these areas of the distribution
correctly reflects the underlying relationship; that is, there is no model
misspecification. The supplementary web appendix [Zigler and Belin (2011)]
considers simulations under a broader range of relationships between X and
both stratum membership and Y and further indicates the potential for bias
in the CACE when using a compliance-predictive covariate.

Fig. 1. Results from simulated data sets where X predicts stratum membership.
As the procedure selects compliers from opposite ends of the severity
distribution (a), estimates of the CACE can become particularly susceptible
to model misspecification (b). Thick lines in (b) are posterior means and
thin lines are 95% posterior intervals. For each value of Corr(X, Y),
posterior summaries are averaged over 50 Monte Carlo simulations. All
simulations have CACE = 0 (horizontal dotted line). (a) Observed SEV
distributions shaded corresponding to
$P(S_i = c \mid X_i, Y_i^{obs}, D_i^{obs}, Z_i, \theta)$. (b) Posterior CACE
estimates under Models A and B, plotted against Corr(X, Y).
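
The mechanism behind this bias can be reproduced in a deliberately simplified, non-Bayesian sketch. The Python code below is not the simulation reported in the paper or its web appendix: all numerical settings, the logistic selection rule and the function names are hypothetical, and a common-slope covariance adjustment stands in for the regression adjustment of Model A. It generates outcomes with no treatment effect, selects "compliers" from the upper part of the X distribution in one arm and from the lower part in the other, and compares an unadjusted difference in means with a covariate-adjusted difference.

```python
import math
import random
import statistics


def simulate_selected(slope, n=4000, seed=1):
    """Simulate (x, y) pairs for units selected as 'compliers' in each arm.

    There is no treatment effect: y depends on x only through `slope`.
    """
    rng = random.Random(seed)
    groups = {0: [], 1: []}
    for z in (0, 1):
        for _ in range(n):
            x = rng.gauss(13.0, 2.5)
            p_high = 1.0 / (1.0 + math.exp(-(x - 13.0)))
            # Selection favors high x when z = 0 and low x when z = 1.
            p_select = p_high if z == 0 else 1.0 - p_high
            if rng.random() < p_select:
                y = 40.0 + slope * x + rng.gauss(0.0, 3.0)
                groups[z].append((x, y))
    return groups


def compare_estimates(groups):
    """Return (unadjusted, adjusted) differences between arms 1 and 0."""
    xbar = {z: statistics.fmean([x for x, _ in g]) for z, g in groups.items()}
    ybar = {z: statistics.fmean([y for _, y in g]) for z, g in groups.items()}
    unadjusted = ybar[1] - ybar[0]
    # Pooled within-arm slope of y on x (common-slope adjustment).
    sxy = sum((x - xbar[z]) * (y - ybar[z]) for z, g in groups.items() for x, y in g)
    sxx = sum((x - xbar[z]) ** 2 for z, g in groups.items() for x, _ in g)
    adjusted = unadjusted - (sxy / sxx) * (xbar[1] - xbar[0])
    return unadjusted, adjusted
```

Calling `compare_estimates(simulate_selected(slope=1.0))` gives an unadjusted difference well below zero even though the true effect is zero, while the adjusted difference stays near zero; with `slope=0.0` the unadjusted difference is also near zero. This mirrors the qualitative pattern described above, in which the bias under an unadjusted analysis grows with the strength of the association between X and Y.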

6. Using SEV to predict principal strata in the motivating oral-surgery
study. As described in Table 1, patients in the oral-surgery example who
had assignment to MMF overruled (known always-takers) had higher average X
than the rest of the sample, patients who had assignment to RIF overruled
(known never-takers) had lower average X than the rest of the sample, and
there was a relatively high estimated proportion of never-takers (50.0%) and
a relatively low proportion of always-takers (14.5%). The oral-surgery
example had missing Y for a substantial proportion of the patients. Based on
observed data, the nonresponse rates were 48.4% and 46.2% in the Z = 0 and
Z = 1 arms, respectively, and 55.6% and 45.0% in the observed always-takers
and never-takers, respectively. To prevent complication of our illustrative
goal, we assume in the models for the oral-surgery data that (1) the $S_i$
are independent of the missing indicator and (2) the missing Y are latently
ignorable conditional on $S_i$ and $Z_i$ [Frangakis and Rubin (1999)]. The
implication of these assumptions for the computation is that missing Y are
drawn at each iteration from the distribution for patients' current stratum
membership conditional on current values of the parameters. Furthermore, the
small number of observed Y values precludes useful estimation of all of the
$\alpha$ parameters in (4.6), leading us to alter Model A to Model A* that
includes the constraint that
$\alpha_1^{n} = \alpha_1^{a} = \alpha_1^{c0} = \alpha_1^{c1}$.

Fig. 2. Posterior-predicted probabilities of membership in the complier
stratum for hypothetical patients with $Z_i = D_i^{obs} = z$ in the
oral-surgery study under Model A*, z = 0, 1. (a) Posterior mean (solid) and
95% intervals (dashed) for
$P(S_i = c \mid X_i, Y_i^{obs}, D_i^{obs}, Z_i, \theta)$, plotted against X.
(b) Observed SEV distributions shaded corresponding to
$P(S_i = c \mid X_i, Y_i^{obs}, D_i^{obs}, Z_i, \theta)$.

The observed relationship between X and membership in the never-taker
and always-taker strata (Table 1) prompts examination of the probabilities
of selection into the stratum of compliers within the compliance-predictive
model. Figure 2(a) shows the posterior predictive distributions of the
Bernoulli probabilities in (4.8) for hypothetical patients with Y equal to
the observed sample mean, X across the range observed in the data, and
$Z_i = D_i^{obs} = z$ for z = 0, 1. The unequal proportions of underlying
strata are reflected in this figure by the fact that the probability of
being sampled as a complier is consistently higher for the patients with
$Z_i = D_i^{obs} = 1$ than for those in the other treatment arm; the low
estimated proportion of always-takers (9/62 = 0.145) implies that most
patients with $Z_i = D_i^{obs} = 1$ belong to the stratum of compliers.
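
The stratum proportions quoted here follow from the counts in Table 1 by simple arithmetic under monotonicity. The short Python sketch below reproduces them; the variable names are hypothetical, and the resulting moment-based shares are not the posterior quantities plotted in Figure 2, only a rough guide to why the two mixtures differ.

```python
# Counts from Table 1, indexed by (assigned Z, received D).
counts = {(0, 0): 53, (0, 1): 9, (1, 1): 40, (1, 0): 40}

n_z0 = counts[(0, 0)] + counts[(0, 1)]  # patients assigned MMF
n_z1 = counts[(1, 1)] + counts[(1, 0)]  # patients assigned RIF

p_always = counts[(0, 1)] / n_z0        # revealed always-takers
p_never = counts[(1, 0)] / n_z1         # revealed never-takers
p_complier = 1.0 - p_always - p_never   # remainder under monotonicity

# Implied share of compliers within each observed mixture.
share_compliers_z1d1 = p_complier / (p_complier + p_always)
share_compliers_z0d0 = p_complier / (p_complier + p_never)
```

Under these counts the implied share of compliers is larger in the $Z_i = D_i^{obs} = 1$ mixture than in the $Z_i = D_i^{obs} = 0$ mixture, which is consistent with the ordering of the two curves described for Figure 2(a).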

The wide spread of the posterior predictive distributions in Figure 2(a)
suggests that X has limited utility for identifying which patients are
compliers, but there is some indication that the relationship between X and
the probability of membership in the complier stratum is slightly different at
the high end of the X distribution depending on the value of $Z_i$ and
$D_i^{\mathrm{obs}}$. To assess the potential for these relationships to affect
the sampling of compliers, we examine in Figure 2(b) the posterior-predictive
probabilities of membership in the complier stratum for hypothetical patients
with X distributions identical to those observed in the sample with
$Z_i = D_i^{\mathrm{obs}} = z$ and with Y equal to the mean value observed in
patients with $Z_i = D_i^{\mathrm{obs}} = z$, for $z = 0, 1$. This illustration
provides limited evidence that patients with different values of $Z_i$ and
$D_i^{\mathrm{obs}}$ are sampled as compliers from different areas of their
respective X distributions. There is a slight positive association between X
and membership in the complier stratum in the $Z_i = D_i^{\mathrm{obs}} = 0$
patients (evidenced by the darkening of the histogram bars as X increases)
that differs from the negative association in the $Z_i = D_i^{\mathrm{obs}} = 1$
patients (evidenced by the lightening of the histogram bars as X increases),
but the amount of uncertainty in these posterior probabilities likely precludes
any serious effect on the estimated CACE.
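
A numerical companion to this graphical check is to compare the covariate distribution of the patients expected to be sampled as compliers in each arm. The sketch below is an illustration under stated assumptions rather than the check used in the paper: it weights each patient's X by a posterior probability of complier membership (hypothetical inputs `p0` and `p1`) and reports the difference in weighted means between arms.

```python
def weighted_mean(values, weights):
    """Mean of values weighted by (non-negative) weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def complier_covariate_gap(x0, p0, x1, p1):
    """Difference (arm 1 minus arm 0) in the complier-weighted mean of X.

    x0, x1: covariate values of patients who received their assigned
        treatment in arms 0 and 1.
    p0, p1: assumed posterior probabilities that each such patient is a
        complier.
    """
    return weighted_mean(x1, p1) - weighted_mean(x0, p0)


# Toy inputs: membership probability rises with X in arm 0, falls in arm 1.
x = [1.0, 2.0, 3.0]
gap = complier_covariate_gap(x, [0.2, 0.5, 0.8], x, [0.8, 0.5, 0.2])
```

Computed once per posterior draw, a gap that stays close to zero relative to its posterior spread is consistent with compliers being drawn from similar regions of X in both arms.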

Overall, the information in Figure 2 does not provide any strong indication
that the compliance-predictive model estimates a CACE calculated from
compliers with different injury characteristics in the two treatment groups.
To explore the sensitivity to alternative models for stratum membership, we
adapt Model A* to replace the probit models in (4.4) with a multinomial
logit model along the lines of that used in Hirano et al. (2000), and refer to
this as Model C*. Using Model C*, figures analogous to Figure 2(a) and (b)
appear largely indistinguishable from those under Model A* and are not
pictured. Table 2 summarizes posterior CACE estimates from a
compliance-predictive analysis under Model A*, Model B and Model C*, as well as
from an analysis following Imbens and Rubin (1997) that does not explicitly
use the SEV covariate at all and places a noninformative conditionally
conjugate Dirichlet prior distribution on the population proportions of
principal strata. None of these models provides evidence of a treatment effect,
and all three compliance-predictive models offer slightly decreased precision,
most likely due to the lack of information contained in the SEV covariate
regarding stratum membership and the inclusion of extraneous model
parameters.

Table 2
Posterior estimates of the CACE in the motivating oral-surgery study using a
model without the compliance-predictive covariate and using three
compliance-predictive strategies

Modeling strategy                 Posterior mean     SD     2.5%    97.5%
Compliance-predictive Model A*         2.65          7.3    -8.9     20.9
Compliance-predictive Model B          0.17          7.0   -13.0     16.0
Compliance-predictive Model C*         1.95          6.7    -9.6     18.9
Model without SEV                      0.74          6.4   -11.0     13.6
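
The precision comparison can be checked directly from the values reported in Table 2. The short sketch below recomputes the width of each 95% posterior interval and confirms that every interval covers zero; the dictionary keys and helper names are illustrative only.

```python
# (posterior mean, SD, 2.5% quantile, 97.5% quantile) as reported in Table 2.
TABLE_2 = {
    "Compliance-predictive Model A*": (2.65, 7.3, -8.9, 20.9),
    "Compliance-predictive Model B": (0.17, 7.0, -13.0, 16.0),
    "Compliance-predictive Model C*": (1.95, 6.7, -9.6, 18.9),
    "Model without SEV": (0.74, 6.4, -11.0, 13.6),
}


def interval_width(row):
    """Width of the 95% posterior interval."""
    return row[3] - row[2]


def covers_zero(row):
    """True when the 95% posterior interval contains zero."""
    return row[2] <= 0.0 <= row[3]


for label, row in TABLE_2.items():
    print(f"{label}: width = {interval_width(row):.1f}, "
          f"covers zero = {covers_zero(row)}")
```

The model without SEV has both the smallest posterior SD (6.4) and the narrowest interval (24.6), in line with the statement that the compliance-predictive models are slightly less precise.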

7. Discussion. Using covariates to model membership in latent principal
strata has many advantages in estimating the CACE. We provide a detailed
illustration of the subtlety involved in using a key covariate when
noncompliance exists in both treatment arms. In particular, we show that when
a covariate is related to stratum membership, a joint-estimation method
can imply treatment groups in the latent stratum of compliers with different
covariate characteristics. The resulting danger of comparing compliers with
different characteristics can be alleviated with modeling assumptions that
correctly extrapolate the treatment effect to areas of the covariate
distribution where compliers are not estimated to exist in both treatment arms.
However, this differential sampling of patients into the complier stratum
poses a serious threat to the CACE under model misspecification, including
calculation of the standard unadjusted CACE when a covariate predicts
stratum membership. We propose simple graphical posterior checks that
indicate the extent to which the estimated CACE relies on compliers that have
different covariate characteristics, potentially characterizing the danger for
model misspecification to bias the estimated CACE.

Our aim is not to discourage the use of covariates that are predictive of
latent stratum membership but rather to shed light on the subtleties involved
and to provide guidance on how to detect whether a compliance-predictive
model endangers estimates of the CACE. Our motivating oral-surgery
example is somewhat unique in its availability of a key covariate that was
thought to influence the treatment received, but the possibility of covariates
relating to stratum membership can arise elsewhere, as with the randomized
encouragement design considered in Hirano et al. (2000) where age and
presence of chronic obstructive pulmonary disease (COPD) were thought to
influence whether patients were in the underlying stratum of individuals who
would always receive a flu vaccination regardless of random encouragement
to do so. The authors of that work include compliance-predictive models
to relax exclusion restrictions and provide posterior estimates of model
parameters suggestive of a different relationship between age and COPD and
the probability of membership in the complier stratum depending on the
values of $Z_i$ and $D_i^{\mathrm{obs}}$. Whether their model tended to consider
a complier stratum consisting of younger patients without COPD in the Z = 1 arm
and older patients with COPD in the Z = 0 arm could be assessed by examining
posterior probabilities of stratum membership across the observed ranges of
these covariates.

We present models that do and do not adjust the CACE for levels of the
key covariate. We frame the choice not to model Y conditional on both S
and X (as in Model B) as a form of model misspecification, but in real
applications researchers are confronted with the decision to calculate the
familiar unadjusted CACE or to specify a more detailed model for
$f(Y \mid S, X)$ and estimate an adjusted CACE. We show that when a covariate is
used to model stratum membership, estimation of the unadjusted CACE can
produce biased results. Thus, we recommend that the CACE be adjusted for any
covariates used to model stratum membership, which is contrary to previous
recommendations that stratum-predictive covariates need not be included in
models for outcomes within strata [Gallop et al. (2009)]. Furthermore,
specification of a more detailed model for $f(Y \mid S, X)$ does not guarantee
correctness, and we provide a framework to assess whether model misspecification
poses a particular danger to estimation of a covariate-adjusted CACE that
can depend on areas of the covariate distribution where there is limited
data.
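
A toy calculation illustrates why the unadjusted comparison can mislead when the compliers in the two arms differ in X. The example is entirely hypothetical: stratum membership is treated as known, the outcome is constructed as Y = X + 2Z so that the true effect is 2, and the adjusted estimate is obtained by ordinary least squares through a Frisch-Waugh residualization rather than by the Bayesian models of the paper.

```python
def mean(values):
    return sum(values) / len(values)


def residualize(v, x):
    """Residuals from a least-squares regression of v on x with intercept."""
    v_bar, x_bar = mean(v), mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (vi - v_bar) for xi, vi in zip(x, v)) / sxx
    return [vi - v_bar - slope * (xi - x_bar) for xi, vi in zip(x, v)]


def unadjusted_effect(y, z):
    """Difference in mean outcome between arm 1 and arm 0."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return mean(treated) - mean(control)


def adjusted_effect(y, z, x):
    """Coefficient on z in the regression of y on z and x (Frisch-Waugh)."""
    ry, rz = residualize(y, x), residualize(z, x)
    return sum(a * b for a, b in zip(ry, rz)) / sum(b * b for b in rz)


# Hypothetical compliers: low X values dominate arm 1, high X values arm 0.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
z = [1, 1, 1, 0, 0, 0]
y = [xi + 2.0 * zi for xi, zi in zip(x, z)]  # true effect of z is 2

naive = unadjusted_effect(y, z)
adjusted = adjusted_effect(y, z, x)
```

In this constructed example the unadjusted difference is about 0.67, while the covariate-adjusted coefficient recovers the true value of 2. The adjustment succeeds here only because the linear outcome model is correct by construction.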

The core features of the scenario presented here, namely, that a
compliance-predictive model must respect the presence of three underlying strata
while a patient of unknown stratum can belong to one of only two strata, can
have conflicting impacts. One way to characterize these issues is to view
modeling membership in the complier stratum not as selection of compliers but
rather as a process for selection of "nonnoncompliers" from both treatment
arms since, no matter how predictive, the compliance-predictive feature is 
anchored to observed information on always-takers and never-takers and 
can only indirectly model membership in the stratum of primary interest. 
Some applications focus on treatment effects within principal strata anal- 
ogous to always-takers [Hudgens, Hoering and Self (2003); Gilbert, Bosch 
and Hudgens (2003); Shepherd et al. (2006); Hudgens and Halloran (2006); 
Roy, Hogan and Marcus (2008)] and are less susceptible to the type of bias 
depicted here because, as in settings where noncompliance exists in only one 
treatment arm, the data provide direct evidence on the relationship between 
covariates and the stratum of primary interest. 

We have characterized scenarios that lend themselves to the use of a com- 
pliance-predictive covariate but leave an opening for bias in the estimation 
of the CACE. Such scenarios warrant careful model checking; in Sections 5 
and 6 we propose steps to investigate the potential for bias. Future research 
on methods that use stratum-predictive covariates to estimate the CACE 
when the data consist of three underlying strata would prove valuable in set- 
tings where it is appealing to use covariates to aid identifiability or improve 
precision of causal estimates. 

SUPPLEMENTARY MATERIAL

Simulation study (DOI: 10.1214/11-AOAS477SUPP; .pdf). A detailed ex- 
position of the potential for bias using a richer set of simulations. 